Optimistic Concurrency Control for Distributed Unsupervised Learning
Research on distributed machine learning algorithms has focused primarily on
one of two extremes - algorithms that obey strict concurrency constraints or
algorithms that obey few or no such constraints. We consider an intermediate
alternative in which algorithms optimistically assume that conflicts are
unlikely and, if conflicts do arise, invoke a conflict-resolution protocol.
We view this "optimistic concurrency control" paradigm as particularly
appropriate for large-scale machine learning algorithms, especially in the
unsupervised setting. We demonstrate our approach in three problem areas:
clustering, feature learning, and online facility location. We evaluate our
methods via large-scale experiments in a cluster computing environment.
Comment: 25 pages, 5 figures
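The paper's own algorithms are not reproduced here; the following is a minimal single-process sketch of the optimistic pattern applied to a DP-means-style clustering step. The names `occ_dp_means` and `lam` (the new-cluster distance threshold) and the batch structure are illustrative assumptions: each batch is processed against a frozen snapshot of the cluster centers (the optimistic step), and new-center proposals are then validated serially, which plays the role of the conflict-resolution protocol.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def occ_dp_means(points, lam, batch_size=4):
    """Optimistic-concurrency sketch of a DP-means-style clusterer."""
    centers = []
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        snapshot = list(centers)              # optimistic snapshot of state
        # optimistic phase (conceptually parallel across the batch):
        # propose a new center for any point far from all snapshot centers
        proposals = [p for p in batch
                     if all(dist(p, c) > lam for c in snapshot)]
        # serial validation phase: accept a proposal only if it still does
        # not conflict with a center accepted earlier in this round
        for p in proposals:
            if all(dist(p, c) > lam for c in centers):
                centers.append(p)
    return centers
```

Conflicts (two workers proposing near-duplicate centers) are expected to be rare when clusters are well separated, so the serial validation phase touches only the small set of proposals rather than every point.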
MLI: An API for Distributed Machine Learning
MLI is an Application Programming Interface designed to address the
challenges of building Machine Learning algorithms in a distributed setting
based on data-centric computing. Its primary goal is to simplify the
development of high-performance, scalable, distributed algorithms. Our initial
results show that, relative to existing systems, this interface can be used to
build distributed implementations of a wide variety of common Machine Learning
algorithms with minimal complexity and highly competitive performance and
scalability.
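MLI's actual interface is not shown in this abstract; the following is a hypothetical sketch of what a data-centric interface can look like, with `MLTable` and `lms_step` as invented stand-in names. The point it illustrates is that an algorithm written only against `map`/`reduce` primitives is independent of how the rows are partitioned across machines.

```python
from functools import reduce as _reduce

class MLTable:
    """Hypothetical stand-in for a data-centric table abstraction."""
    def __init__(self, rows):
        self.rows = rows

    def map(self, fn):
        # a distributed runtime would ship fn to each data partition
        return MLTable([fn(r) for r in self.rows])

    def reduce(self, fn, init):
        # a distributed runtime would combine per-partition partials
        return _reduce(fn, self.rows, init)

def lms_step(table, w, lr=0.1):
    """One least-squares gradient step for rows (x, y), written only
    against map/reduce so the same code could run over partitioned data."""
    n = len(table.rows)
    grad = table.map(lambda xy: (xy[0] * w - xy[1]) * xy[0]) \
                .reduce(lambda a, b: a + b, 0.0) / n
    return w - lr * grad
```

For data drawn from y = 2x, repeated calls to `lms_step` drive `w` toward 2 without the algorithm code ever referencing the physical data layout.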
Effects of different levels of dietary crude protein on the physiological response, reproductive performance, blood profiles, milk composition and odor emission in gestating sows
Objective: This study was conducted to evaluate the effects of dietary crude protein (CP) levels on the physiological response, reproductive performance, blood profiles, milk composition, and odor emission in gestating sows.

Methods: Seventy-two multiparous sows (Yorkshire×Landrace) of similar average body weight (BW), backfat thickness, and parity were assigned to one of six treatments, with 10 or 11 sows per treatment, in a completely randomized design. The experimental diets were corn–soybean-based and differed only in CP level: i) CP11 (11% CP); ii) CP12 (12% CP); iii) CP13 (13% CP); iv) CP14 (14% CP); v) CP15 (15% CP); and vi) CP16 (16% CP).

Results: There was no significant difference in sow performance or piglet growth across dietary protein levels. Milk fat (linear, p = 0.05) and total solids (linear, p = 0.04) decreased as dietary CP levels increased. Increasing dietary CP levels in the gestation diet caused a significant increase in creatinine at days 35 and 110 of gestation (linear, p = 0.01 for both). Total protein in sows also increased with dietary CP level during gestation and at 24 hours postpartum (linear, p = 0.01 for both). Over the whole experimental period, urea in sows increased with increasing dietary CP (linear, p = 0.01), and increasing blood urea nitrogen (BUN) concentrations were observed as well. In the blood parameters of piglets, there were linear increases in creatinine, total protein, urea, and BUN (linear, p = 0.01 each) with increasing dietary CP, as measured 24 hours postpartum.
At the two measurement points (days 35 and 110 of gestation), odor gas concentrations, including amine, ammonia, and hydrogen sulfide, increased linearly when sows were fed diets with increasing levels of dietary CP (linear, p = 0.01). Moreover, as dietary CP levels increased to 16%, the odor gas concentration showed a quadratic response (quadratic, p = 0.01).

Conclusion: Reducing dietary CP levels from 16% to 11% in a gestating diet did not exert detrimental effects on sow body condition or piglet performance. Moreover, a low-protein diet (11% CP) may improve dietary protein utilization and metabolism, reducing odor gas emissions from manure and urine in gestating sows.
FfDL : A Flexible Multi-tenant Deep Learning Platform
Deep learning (DL) is becoming increasingly popular in several application
domains and has made feasible and accurate many new application features
involving computer vision, speech recognition and synthesis, self-driving
automobiles, and drug design. As a result, large-scale on-premise and
cloud-hosted deep learning platforms have become essential infrastructure in
many organizations. These systems accept, schedule, manage and execute DL
training jobs at scale.
This paper describes the design, implementation and our experiences with
FfDL, a DL platform used at IBM. We describe how our design balances
dependability with scalability, elasticity, flexibility and efficiency. We
examine FfDL qualitatively through a retrospective look at the lessons learned
from building, operating, and supporting FfDL; and quantitatively through a
detailed empirical evaluation of FfDL, including the overheads introduced by
the platform for various deep learning models, the load and performance
observed in a real case study using FfDL within our organization, the frequency
of various faults observed including unanticipated faults, and experiments
demonstrating the benefits of various scheduling policies. FfDL has been
open-sourced.
Comment: MIDDLEWARE 201
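FfDL's scheduler is not described in detail in this abstract; the following is a toy dispatcher sketch, not FfDL's implementation, meant only to make "various scheduling policies" concrete. The job fields `name`, `submitted`, and `est_minutes` and the two policy names are invented for illustration.

```python
def schedule(jobs, policy="fifo"):
    """Order DL training jobs for execution under a pluggable policy.

    jobs: list of dicts with hypothetical fields 'name', 'submitted'
    (arrival order), and 'est_minutes' (estimated runtime).
    Returns job names in dispatch order.
    """
    keys = {
        "fifo": lambda j: j["submitted"],            # arrival order
        "shortest_first": lambda j: j["est_minutes"],  # favor short jobs
    }
    if policy not in keys:
        raise ValueError(f"unknown policy: {policy}")
    return [j["name"] for j in sorted(jobs, key=keys[policy])]
```

Comparing the two orderings on the same queue is the kind of experiment the paper's scheduling-policy evaluation performs at platform scale.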
Parallel Machine Learning Using Concurrency Control
Many machine learning algorithms iteratively process datapoints and transform global model parameters. As processor speeds fail to keep up with the growth in dataset sizes, it has become increasingly impractical to execute such iterative algorithms serially.

To address this problem, the machine learning community has turned to two parallelization strategies: bulk synchronous parallel (BSP) and coordination-free. BSP algorithms partition computational work among workers, with occasional synchronization at global barriers, but have only been applied to ‘embarrassingly parallel’ problems where the work is trivially factorizable. Coordination-free algorithms simply allow concurrent processors to execute in parallel, interleaving transformations and possibly introducing inconsistencies; theoretical analysis is then required to prove that the coordination-free algorithm produces a reasonable approximation to the desired outcome, under assumptions on the problem and system.

In this dissertation, we propose and explore a third approach: applying concurrency control to manage parallel transformations in machine learning algorithms. We identify points of possible interference between parallel iterations by examining the semantics of the serial algorithm. Coordination is then introduced to either avoid or resolve such conflicts, while non-conflicting transformations are allowed to execute concurrently. Our parallel algorithms are thus engineered to produce exactly the same output as the serial machine learning algorithm, preserving the serial algorithm's theoretical guarantees of correctness while maximizing concurrency.

We demonstrate the feasibility of our approach by parallelizing a variety of machine learning algorithms, including nonparametric unsupervised learning, graph clustering, discrete optimization, and sparse convex optimization. We theoretically prove and empirically verify that our parallel algorithms produce output equivalent to their serial counterparts.
We also theoretically analyze the expected concurrency of our parallel algorithms, and empirically demonstrate their scalability.
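The dissertation's algorithms are not reproduced here; the following is a toy sketch of the concurrency-control idea on a deliberately simple problem (counting items), with `parallel_count` as an invented name. Updates to different keys run concurrently, while updates to the same key serialize on that key's lock, so the parallel run is guaranteed to produce the same output as the serial algorithm.

```python
import threading
from collections import defaultdict

def parallel_count(items, n_workers=4):
    """Count items in parallel with per-key locks (fine-grained
    concurrency control); output is identical to a serial count."""
    counts = defaultdict(int)
    locks = defaultdict(threading.Lock)
    table_lock = threading.Lock()        # guards lock-table creation

    def key_lock(k):
        with table_lock:
            return locks[k]

    def worker(chunk):
        for item in chunk:
            with key_lock(item):
                counts[item] += 1        # conflicting updates serialize here

    threads = [threading.Thread(target=worker,
                                args=(items[i::n_workers],))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(counts)
```

Only the transformations that actually interfere (same key) pay a coordination cost; everything else proceeds concurrently, which is the trade-off the dissertation engineers for much richer algorithms.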
City-Scale Traffic Estimation from a Roving Sensor Network
Traffic congestion, volumes, origins, destinations, routes, and other road-network performance metrics are typically collected through survey data or via static sensors such as traffic cameras and loop detectors. This information is often out-of-date, difficult to collect and aggregate, difficult to analyze and quantify, or all of the above. In this paper, we conduct a case study demonstrating that it is possible to accurately infer traffic volume from data collected by a roving sensor network of taxi probes that log their locations and speeds at regular intervals. Our model and inference procedures can be used to analyze traffic patterns and conditions from historical data, as well as to infer current patterns and conditions from data collected in real time. As such, our techniques provide a powerful new sensor network approach for traffic visualization, analysis, and urban planning.
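The paper's inference model is not reproduced here; the following is a minimal sketch of only the first aggregation step one would need, with `mean_segment_speed` and the `(segment_id, speed_kmh)` log format as illustrative assumptions: collapsing the roving probes' interval logs into a per-segment mean speed, on top of which a volume model could be fit.

```python
from collections import defaultdict

def mean_segment_speed(probe_logs):
    """Average the speeds that roving probes logged on each road segment.

    probe_logs: iterable of (segment_id, speed_kmh) records, one per
    probe report. Returns {segment_id: mean speed in km/h}.
    """
    totals = defaultdict(lambda: [0.0, 0])   # segment -> [speed sum, count]
    for segment_id, speed_kmh in probe_logs:
        totals[segment_id][0] += speed_kmh
        totals[segment_id][1] += 1
    return {seg: s / n for seg, (s, n) in totals.items()}
```

Because the probes rove, coverage per segment varies over time; the count kept alongside each sum is exactly the per-segment sample size a real estimator would weight by.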